UKN
Dr. Peter Bak, University of Konstanz, bak@dbvis.inf.uni-konstanz.de [PRIMARY contact]
Patrick Jungk, University of Konstanz, patrick.jungk@uni-konstanz.de [lead development, analyst]
VAT – Video Analysing Tool
Developed at: University of Konstanz
by: Patrick Jungk
Version 1.0
Tool for
Movement Detection
Classification of Events
Pattern Detection
Visualisation of Patterns, Classified Events, and Detected Movement
KNIME – Konstanz Information Miner
Developed at: University of Konstanz
by: KNIME CORE TEAM
Version 2.0.3
KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models. […]
Video:
ANSWERS:
Short Answer
Figure 1: Analysis process of video data
In order to reduce the data sufficiently for successful information extraction, the following basic steps are indispensable:
Movement Detection
Location Determination
Classification of moving areas
Automatic Classification Prediction
Filtering methods
Behaviour Pattern Detection
The basic concept is to determine events by detecting moving objects and surrounding them with bounding boxes for classification. Afterwards, suspicious behavioural patterns can be detected automatically and verified manually.
Figure 1: Detection of possible events requires data reduction (the part below shows the determined patterns)
The retrieved data can be evaluated quickly by visualising the relevant patterns. The initial data can be reduced to less than 1 %, which decreases the human labour required for long videos of more than one hour.
MC3.1: Provide a tab-delimitated table containing the location, start time and duration of the events identified above. Please name the file Video.txt and place it in the same directory as your index.htm file. Please see the format required in the Task Descriptions.
MC3.2: Identify any events of potential counterintelligence/espionage interest in the video. Provide a Detailed Answer, including a description of any activities, and why the event is of interest.
2.3 Determination of bounding boxes
2.4 Classification of Bounding Boxes
3 Determination of Suspicious Events
4.2 Performance Comparison of Automatic and Interactive Parts
List of figures
Figure 1: Classification needs interactive user involvement
Figure 3: Analysis process of video data
Figure 4: Classification needs user interaction and computing using prediction algorithms
Figure 5: Patterns can be verified manually and marked to be exported to a result table
List of Tables
Table 1: Mapped colours to the classified bounding boxes for visualisation
Table 2: Comparison of user and hardware process times for video 1
Table 3: Data reduction of video one leads to relevant events
To identify any events of potential counter-intelligence/espionage interest, a definition of such a suspicious event needs to be given. The following events were defined as suspicious:
one person dropping an item, another person picking up an item
two persons meeting
one person waiting for another
one person being close to a car
two cars stopping next to each other
These events need to be described formally as behavioural patterns. To recognise such an event, the following dimensions have to be considered as well:
items
areas
Suspicious items are as follows:
humans
cars
Suspicious areas are as follows:
behind hiding objects (trees, etc.)
on the pavement
at parking areas
at the pavement/street border
These definitions specify the areas of interest. In order to determine events, items within an area have to be recognised. Therefore, areas of movement within the video need to be detected, since every moving object may indicate suspicious activity. Those areas of movement are marked and classified, see figure 1.
The following types are considered potentially suspicious and need to be determined:
one human
one vehicle
two humans close to each other
one human being close to one vehicle
All other moving areas are not considered suspicious; they are irrelevant and can be excluded.
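Excluding the irrelevant movement areas amounts to a simple set-membership test over the classified types. The sketch below illustrates this; the type labels and toy data are ours, not VAT's internal representation:

```python
# Hypothetical classified movement areas: (frame number, predicted type).
areas = [
    (1, "human"), (1, "tree shadow"), (2, "two humans"),
    (3, "human near vehicle"), (3, "bird"), (4, "vehicle"),
]

# The four potentially suspicious types defined above (labels illustrative).
SUSPICIOUS = {"human", "vehicle", "two humans", "human near vehicle"}

# Keep only areas of a suspicious type; everything else is excluded.
relevant = [a for a in areas if a[1] in SUSPICIOUS]
print(len(relevant))  # -> 4 (of 6 areas remain)
```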
To analyse the video data, an interactive process based on the KDD (Knowledge Discovery in Databases) - pipeline (Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. From Data Mining to Knowledge Discovery: An Overview. Advances in Knowledge Discovery and Data Mining. (1996), 1-34.) was used as shown in figure 4.
Figure 2: KDD pipeline (Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. From Data Mining to Knowledge Discovery: An Overview. Advances in Knowledge Discovery and Data Mining. (1996), 1-34; http://www.aaai.org/aitopics/assets/PDF/AIMag17-03-2-article.pdf)
Following this terminology, a flow chart was created describing the operative steps required to conduct a successful analysis of video stream data.
Figure 3: Analysis process of video data
In order to extract the data from the video, thresholds have to be set manually by the user. The most important thresholds are as follows:
number of frames to skip when calculating the difference between two frames
minimum size of a bounding box (area inside the bounding box), to suppress noise
threshold (in pixels) for recognising a position change
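The effect of the first threshold (frame skip) together with a per-pixel difference threshold can be sketched in plain NumPy. This is a minimal illustration, not VAT's implementation; the function name, toy frames, and threshold values are ours:

```python
import numpy as np

def frame_difference(frames, skip=5, diff_threshold=30):
    """Compare each frame with the one `skip` frames earlier and return
    one boolean movement mask per compared pair. `skip` corresponds to the
    user-set frame-skip threshold; `diff_threshold` decides which per-pixel
    intensity changes count as movement."""
    masks = []
    for i in range(skip, len(frames)):
        diff = np.abs(frames[i].astype(int) - frames[i - skip].astype(int))
        masks.append(diff > diff_threshold)
    return masks

# Toy example: six 8x8 greyscale frames with one bright block appearing.
frames = [np.zeros((8, 8), dtype=np.uint8) for _ in range(6)]
frames[5][2:4, 2:4] = 200  # "object" appears in the last frame
masks = frame_difference(frames, skip=5, diff_threshold=30)
print(masks[0].sum())  # -> 4 pixels flagged as movement
```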
2.1 Determination of bounding boxes
As the result of the detection chain, the bounding boxes inside each frame are determined. Figure 1 shows the result of the automatic determination of bounding boxes. The colour bar below the frame preview shows the count of bounding boxes over time; each line stands for one location, starting with the first. The lighter the colour, the more bounding boxes were found in that time slot. Each movement of the camera position indicates a change of location, which yields the location information relevant for the result.
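How bounding boxes could be derived from a movement mask can be sketched as a connected-component search with the minimum-area threshold applied for noise suppression. This is a simplified stand-in for VAT's detection chain, with illustrative names and values:

```python
import numpy as np
from collections import deque

def bounding_boxes(mask, min_area=3):
    """Label 4-connected movement regions in a boolean mask and return
    bounding boxes (top, left, bottom, right) for regions whose pixel
    count reaches `min_area` (the noise-suppression threshold)."""
    visited = np.zeros_like(mask, dtype=bool)
    boxes = []
    h, w = mask.shape
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not visited[y, x]:
                # Flood-fill one connected region via breadth-first search.
                queue, pixels = deque([(y, x)]), []
                visited[y, x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            queue.append((ny, nx))
                if len(pixels) >= min_area:  # suppress tiny noise regions
                    ys, xs = zip(*pixels)
                    boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes

mask = np.zeros((10, 10), dtype=bool)
mask[1:4, 1:4] = True   # large moving region -> kept
mask[7, 7] = True       # single noisy pixel -> suppressed
print(bounding_boxes(mask, min_area=3))  # -> [(1, 1, 3, 3)]
```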
2.2 Classification of Bounding Boxes
The classification is an interactive process divided into two sub-processes:
manual classification of a subset of bounding boxes (training)
automatic classification of the remaining bounding boxes using a neural network (MultiLayer Perceptron Predictor) or a Decision Tree Predictor
No. | name       | colour | R   | G   | B
1   | human      | green  | 77  | 157 | 74
2   | two humans | orange | 255 | 127 | 0
3   | car        | red    | 228 | 26  | 28
4   | two cars   | blue   | 126 | 126 | 184
Table 1: Mapped colours to the classified bounding boxes for visualisation
This training data is used in the next step (MultiLayer Perceptron Predictor or Decision Tree Predictor), as figure 4 shows. At the end of this step, a table containing all bounding boxes of one sub-video results.
Figure 4: Classification needs user interaction and computing using prediction algorithms
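The train-then-predict step can be sketched with scikit-learn's DecisionTreeClassifier as a stand-in for KNIME's Decision Tree Predictor node. The features (box width and height in pixels) and all training values are hypothetical; VAT's actual feature set is not specified here:

```python
from sklearn.tree import DecisionTreeClassifier

# Manually labelled training subset of bounding boxes (hypothetical
# [width, height] values per class).
X_train = [[10, 30], [12, 28], [40, 20], [44, 18], [22, 30], [80, 22]]
y_train = ["human", "human", "car", "car", "two humans", "two cars"]

# Train on the labelled subset, then predict the remaining boxes.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict([[11, 29], [42, 19]]))  # -> ['human' 'car']
```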
3 Determination of Suspicious Events
Once the patterns have been recognized, the suspicious events can be reviewed by visualisation of the patterns.
Figure 5: Patterns can be verified manually and marked to be exported to a result table
As a result, the most relevant pattern is:
one person meeting another
This also covers two persons walking down a street together (which implies a previous meeting).
The pattern:
one person near a car
needs to be redefined for another run, since too many events were found.
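One way to formalise the "one person meeting another" pattern is to flag frames in which two human boxes stay within a distance threshold for several consecutive frames. The sketch below illustrates the idea; the data layout, function name, and threshold values are ours, not VAT's pattern definitions:

```python
import numpy as np

def meetings(tracks, max_dist=15.0, min_frames=3):
    """Return start frames of intervals where two tracked persons stay
    within `max_dist` pixels of each other for at least `min_frames`
    consecutive frames. `tracks[frame]` maps person id -> (x, y) centre."""
    close_run, events = 0, []
    for frame, centres in enumerate(tracks):
        ids = list(centres)
        # Is any pair of persons close together in this frame?
        near = any(
            np.hypot(*np.subtract(centres[a], centres[b])) <= max_dist
            for i, a in enumerate(ids) for b in ids[i + 1:]
        )
        close_run = close_run + 1 if near else 0
        if close_run == min_frames:
            events.append(frame - min_frames + 1)  # start of the meeting
    return events

# Two people approach, stay together from frame 2 onwards, then drift.
tracks = [
    {"p1": (0, 0),  "p2": (60, 0)},
    {"p1": (20, 0), "p2": (40, 0)},
    {"p1": (28, 0), "p2": (36, 0)},
    {"p1": (30, 0), "p2": (34, 0)},
    {"p1": (30, 0), "p2": (33, 0)},
    {"p1": (30, 0), "p2": (35, 0)},
]
print(meetings(tracks))  # -> [2]
```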
4.2 Performance Comparison of Automatic and Interactive Parts
Performance is assessed for the interactive and automatic parts of the process chain. Process times for the user as well as for the hardware (server, PC) are listed separately in table 2.
No. | Process step                            | time in min (user) | time in min (HW)
1   | Frame extraction                        | 0                  | 180-360
2   | Set up thresholds                       | 5-15               | 5-15
3   | Determination of bounding boxes         | 0                  | 180-240
4   | Classifying a subset of bounding boxes  | 5-15               | 5-15
5   | Filtering boxes                         | <1                 | <1
6   | Visualisation                           | 0                  | <1
7   | Pattern recognition                     | 0                  | <1
8   | Pattern recognition review              | 5-30               | 0
Table 2: comparison of user and hardware process times for video 1
Table 3 shows the reduction of data for video 1. The final relevant data was reduced to about 0.008 % of the potentially relevant data.
No. | Process step                            | table rows (input) | table rows (output)
1   | Frame extraction                        | 0                  | 0
2   | Set up thresholds                       | 0                  | 0
3   | Determination of bounding boxes         | 0                  | 143528
4   | Classifying a subset of bounding boxes  | 143528             | 143528
5   | Filtering bounding boxes                | 143528             | 93865
6   | Visualisation                           | 93865              | 93865
7   | Pattern recognition                     | 93865              | 3859
8   | Pattern recognition review              | 3859               | 12
Table 3: Data reduction of video one leads to relevant events
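As a quick sanity check, the quoted reduction ratio follows directly from the row counts in table 3 (143528 detected bounding boxes down to 12 reviewed events):

```python
rows_in, rows_out = 143528, 12          # from table 3
ratio = rows_out / rows_in * 100        # final rows as a percentage
print(f"{ratio:.4f} %")                 # -> 0.0084 %, i.e. roughly 0.008 %
```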
Compared to the complete video time (4 h), the user interaction takes between 25 and 70 minutes. The VAT tool enables an analyst to focus her/his attention on a limited number of automatically preselected events; otherwise it would be very difficult and exhausting to attentively watch whole videos several hours long.